# Performance Comparison of Parallel Asynchronous Self Time Adder using Modified GDI Logic for Low Power & Low Area Applications

<sup>1</sup>M. Sivakumar, Research Scholar/ECE Department, SCSVMV University, Kanchipuram

<sup>2</sup>S. Omkumar, Associate Professor/ECE Department, SCSVMV University, Kanchipuram

#### Abstract

According to recent research, conventional complementary static CMOS (CsCMOS) technology can be used to design a parallel self-timed adder circuit in which the NMOS and PMOS transistor counts are equal. Such a design requires a large number of transistors, so its power consumption and area are high. To address this, a modified gate diffusion input (GDI) logic is proposed for the parallel asynchronous self-timed adder (PASTA). The results indicate that the proposed design uses far fewer transistors and consumes comparatively less power. The designs are simulated and verified using Tanner Tool v14.11.

**Key Words:** *PASTA, CsCMOS, Adder circuit, GDI logic, Tanner Tool v14.11*

## I. INTRODUCTION

Binary addition is one of the main operations performed by any processor. Most adders have been designed for synchronous circuits. Asynchronous circuits do not assume any quantization of time, so they are free from several problems that arise from clock timing. In principle, logic flow in asynchronous circuits is controlled by a request-acknowledgment handshaking protocol that establishes a pipeline in the absence of clocks. Explicit handshaking blocks for small elements, such as bit adders, are expensive. Therefore, handshaking is managed implicitly and efficiently using dual-rail carry propagation in adders. A valid dual-rail carry output also provides an acknowledgment from a single-bit adder block. Hence, asynchronous adders are based either on dual-rail encoding of all signals or on pipelined operation with dual-rail carry representations. While these constructs add robustness to circuit designs, they also introduce significant overhead that erodes the average-case performance benefits of asynchronous adders.

This paper considers a parallel asynchronous self-timed adder (PASTA). The design of PASTA is regular and uses half adders along with multiplexers, requiring minimal interconnections. The design works in a parallel manner for independent carry chain blocks. The implementation in this brief is unique as it employs feedback through XOR logic gates to constitute a single-rail cyclic asynchronous sequential adder. Cyclic circuits can be more resource efficient than their acyclic counterparts. On the other hand, wave pipelining (or maximal rate pipelining) is a technique that can apply pipelined inputs before the outputs are stabilized. The proposed circuit manages automatic single-rail pipelining of the carry inputs separated by propagation and inertial delays of the gates in the circuit path. Thus, it is effectively a single-rail wave-pipelined approach and quite different from conventional pipelined adders using dual-rail encoding to implicitly represent the pipelining of carry signals.

The GDI cell has three inputs: G (the common gate input of the NMOS and PMOS transistors), N (the input to the source/drain of the NMOS transistor) and P (the input to the source/drain of the PMOS transistor). The GDI technique has proved efficient for both sequential and combinational logic implementation compared with older CMOS methods [10]. A variety of combinational circuits, such as comparators, multipliers and adders, have been implemented in technologies from 18nm to 180nm, demonstrating a power reduction of 40%. GDI flip-flops (FF) have been implemented in the 350nm and 18nm technologies. The main aim of this paper is to design a reduced parallel asynchronous self-timed adder based on GDI logic that lowers the power consumption and the transistor count compared with existing adder circuits [11].
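The behaviour of the three-terminal cell described above can be sketched at the logic level. The following is a minimal, idealized model, assuming full-swing logic levels and ignoring threshold-voltage drops; the function names are hypothetical, and the sketch models the basic GDI cell rather than the modified cell used in this paper, whose transistor-level details are not reproduced here.

```python
def gdi_cell(g: int, p: int, n: int) -> int:
    """Idealized basic GDI cell (assumption: ideal 0/1 levels).

    When G = 0 the PMOS conducts and passes P to the output;
    when G = 1 the NMOS conducts and passes N to the output.
    """
    return n if g else p


def gdi_not(a: int) -> int:
    return gdi_cell(a, 1, 0)          # P = 1, N = 0 -> NOT A


def gdi_and(a: int, b: int) -> int:
    return gdi_cell(a, 0, b)          # P = 0, N = B -> A AND B


def gdi_or(a: int, b: int) -> int:
    return gdi_cell(a, b, 1)          # P = B, N = 1 -> A OR B


def gdi_mux(sel: int, i0: int, i1: int) -> int:
    return gdi_cell(sel, i0, i1)      # P = i0, N = i1 -> 2:1 multiplexer


def gdi_xor(a: int, b: int) -> int:
    # Logical composition only: P = B, N = NOT B gives A XOR B.
    return gdi_cell(a, b, gdi_not(b))


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, gdi_and(a, b), gdi_or(a, b), gdi_xor(a, b))
```

Each function maps to a single cell (plus one inverter for the XOR composition), which is the intuition behind the reduced transistor count of GDI-based designs.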

## II. LITERATURE REVIEW

D. Geer [1] discussed clockless chips, which offer an advantage over their synchronous counterparts because they use cycle times efficiently. Synchronous processors must make sure they can complete each part of a computation in one clock tick. Thus, in addition to running their logic, the chips must add cycle time to compensate for how much longer it takes to run some operations than to run average operations (worst case – average case), variations in clock operations (jitter and skew), and manufacturing and environmental irregularities.

S. Nowick [2] described a general method, called speculative completion, for designing asynchronous datapath components. Its advantages include the use of a bundled-data approach and of single-rail synchronous datapaths, while also providing a path for early completion. As a case study, the method is applied to a high-performance parallel BLC adder. Through careful gate-level analysis, performance improvements of up to 30% over a comparable synchronous implementation are expected.

F.-C. Cheng et al. [3] proposed a self-timed carry-lookahead adder in which the logic complexity is a linear function of n, the number of inputs, and the average computation time is proportional to the logarithm of the logarithm of n. To the best of the authors' knowledge, their adder has the best area-time efficiency, which is $\Theta(n \log \log n)$. An economic implementation of this adder in CMOS technology is also presented. SPICE simulation results show that, based on random inputs, their 32-bit self-timed carry-lookahead adder is 2.39 and 1.42 times faster than its synchronous counterpart and the self-timed ripple-carry adder, respectively; and, based on statistical data gathered from a 32-bit ARM simulator, it is 1.99 and 1.83 times faster than its synchronous counterpart and the self-timed ripple-carry adder, respectively.

P. Choudhury et al. [4] presented a hardware architecture to perform the basic arithmetic operation of addition using Cellular Automata (CA). This age-old problem of addition was previously solved by a ripple-carry circuit, a carry-lookahead circuit or a combination of the two. Each of these circuits is purely combinational in nature and their complexity is centered on the number of logic gates and the associated gate delays. On the contrary, in their CA-based design the complexity is mainly centered on the number of clock cycles required to finish the computation instead of the gate delays.

M. D. Riedel [5] proposed cyclic combinational circuits. Digital circuits are called combinational if they are memoryless: they have outputs that depend only on the current values of the inputs. Combinational circuits are generally thought of as acyclic (i.e., feed-forward) structures. And yet, cyclic circuits can be combinational. Cycles sometimes occur in designs synthesized from high-level descriptions. Feedback in such cases is carefully contrived, typically occurring when functional units are connected in a cyclic topology. Although the premise of cycles in combinational circuits had been accepted, and analysis techniques had been proposed, no one had attempted the synthesis of circuits with feedback at the logic level.

W. Liu et al. [6] discussed wave pipelining (also known as maximal rate pipelining), a timing methodology used in digital systems to increase the number of effective pipelined stages without increasing the number of physical registers in the system. Using this technique, new data are applied to the inputs of a combinational block before the previous outputs are available, thus effectively pipelining the combinational logic. Achieving a high degree of wave pipelining in CMOS technology requires careful study of delay balancing techniques involving circuit design, layout method, and testing structure.

## III. EXISTING PASTA USING CSCMOS TECHNIQUE

The existing parallel asynchronous self-timed adder (PASTA) is designed using complementary static CMOS logic for half-carry and half-sum generation, 2:1 multiplexers and a completion detection unit (CDU), as shown in Fig. 3.1. The addition is performed in iterative phases. Consider, for example, a = 1101, b = 1111 and cin = 0 (no incoming carry). In the initial phase (sel = 0), the inputs a and b are applied directly to the half adders, which generate the sum and carry outputs; a carry input cin = 1, if present, is applied only during this first iteration. After that, sel = 1 and cin is set to 0. In other words, when the selection line is 0 the first path is selected and the values (a, b) are given to the half adder; when the selection line is 1 the feedback path is enabled, so the carry input and the previous output are sent to the half adder. All the carry values are then checked: if any of them is nonzero, the iteration is repeated until all carry values become zero. This kind of adder is called a parallel asynchronous self-timed adder [7].
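The iterative scheme described above can be traced on the example operands with a short behavioural sketch. This is an illustration only, assuming ideal logic and treating each operand as an integer bit vector; the function name `pasta_add` is hypothetical and the model says nothing about gate delays or power.

```python
def pasta_add(a: int, b: int, cin: int = 0):
    """Behavioural sketch of the recursive half-adder addition.

    Returns (result, iterations), where result includes the carry-out
    as its most significant bit and iterations counts feedback passes.
    """
    # Initial phase (sel = 0): every bit position computes a half-adder
    # sum and carry from the operands; cin enters as the carry into bit 0.
    s = a ^ b
    c = ((a & b) << 1) | cin
    iterations = 0
    # Iterative phase (sel = 1): feed sums and carries back into the
    # half adders until all carries are zero.
    while c:
        s, c = s ^ c, (s & c) << 1
        iterations += 1
    return s, iterations


if __name__ == "__main__":
    result, iterations = pasta_add(0b1101, 0b1111, 0)
    print(bin(result), iterations)   # 0b11100 2
```

For a = 1101 and b = 1111 the initial phase gives sum 00010 and carries 11010; the first feedback pass gives sum 11000 and carries 00100; the second gives sum 11100 with all carries zero, so the addition completes with 13 + 15 = 28.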



Fig.3.1 Conventional n-bit PASTA using static CMOS logic.

The structure of the conventional half adder is shown in Fig. 3.3. The conventional parallel asynchronous self-timed adder is simplified by using only half adders, instead of full adders, together with an iterative phase. The multiplexer has two data inputs (i0 and i1) and one control input (sel), and produces a single output (x). The multiplexer circuit consists of 6 PMOS and 6 NMOS transistors connected as shown in Fig. 3.2 [8]. For example, if i0 = 1, i1 = 0 and sel = 0, pmos1 is off and nmos3 is on because i0 = 1; the inverted sel is 1, so nmos4, which is connected to ground, is also on and the internal output is zero. This output is connected to the inverter, so x = 1. When the selection line is 0, the value of i0 is passed to the output; when the selection line is 1, the value of i1 is passed to the output. The functionality of the multiplexer is verified using static CMOS logic.

The existing half adder circuit consists of 5 PMOS and 5 NMOS transistors for sum generation and 3 PMOS and 3 NMOS transistors for carry generation. In total, 16 transistors are required to design the existing half adder circuit using static CMOS technology. If the input x0 is 0 and x1 is 1, PMOS\_3 and NMOS\_2 are on, so the value passed through PMOS\_3 is zero, which enters the inverter; hence the final half adder sum output is 1. Similarly, for carry generation with the same inputs, PMOS1 and PMOS2 are on and connected to vdd, so the internal output is 1. Because it is connected to the inverter, the final carry output is 0 when x0 = 0 and x1 = 1. All other input combinations are processed in the same way to produce the correct output.
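The two worked examples above (multiplexer and half adder) both describe an inverting stage followed by an output inverter. A minimal logic-level sketch of that structure, assuming ideal gates and using hypothetical function names, is:

```python
def inv(x: int) -> int:
    """Ideal inverter."""
    return 1 - x


def mux2(i0: int, i1: int, sel: int) -> int:
    """2:1 multiplexer: inverting select stage followed by an inverter."""
    stage = inv(i1 if sel else i0)    # internal (inverted) node
    return inv(stage)


def half_adder(x0: int, x1: int):
    """Half adder built from XNOR and NAND stages plus output inverters."""
    sum_stage = inv(x0 ^ x1)          # XNOR stage (internal node)
    carry_stage = inv(x0 & x1)        # NAND stage (internal node)
    return inv(sum_stage), inv(carry_stage)


if __name__ == "__main__":
    print(mux2(1, 0, 0))       # 1: sel = 0 passes i0
    print(half_adder(0, 1))    # (1, 0): sum = 1, carry = 0
```

The printed values match the cases discussed in the text: with i0 = 1, i1 = 0 and sel = 0 the internal node is 0 and x = 1, and with x0 = 0, x1 = 1 the sum is 1 and the carry is 0.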



Fig.3.2 Schematic diagram of conventional multiplexer using Static CMOS technique.



Fig.3.3 Schematic diagram of existing half adder using Static CMOS technique.



Fig 3.4 Circuit diagram of existing CDU using Static CMOS technique.

The conventional completion detection unit (CDU) gate is designed using the static CMOS method, as shown in Fig. 3.4. The output of the CDU is generated by a NOR operation: if the sel input and all the carry inputs are zero, the output of the CDU is one [8]; otherwise the output is zero. To further decrease the power consumption and the number of transistors in the conventional PASTA, optimized half adder structures are used in the proposed system.

## IV. PROPOSED PASTA ADDER USING MODIFIED GDI LOGIC

### A. Design of 4-Bit Parallel Self-Timed Adders

Design is carried out in a 250nm technology with a supply voltage of 5 volts. A CMOS implementation for the recursive circuit is shown below. For multiplexers and AND gates we have used TSMC library implementations while for the XOR gate we have used the faster ten transistor implementation based on transmission gate XOR to match the delay with AND gates [9].

The completion detection expression is negated to obtain an active-high completion signal (TERM). This requires a large fan-in *n*-input NOR gate. Therefore, a more practical alternative, a pseudo-nMOS ratioed design, is used [12]. The resulting design is shown in the figure. Using the pseudo-nMOS design, the completion unit avoids the high fan-in problem as all the connections are parallel. The pMOS transistor connected to *V*DD in this ratioed design acts as a load resistor, resulting in static current drain when some of the nMOS transistors are on simultaneously.

In addition to the carry signals Ci, the negation of the SEL signal is also included in the TERM signal to ensure that completion cannot be accidentally signaled during the initial selection phase of the actual inputs. It also prevents the pMOS pull-up transistor from being always on. Hence, static current flows only for the duration of the actual computation.
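The effect of including the negated SEL signal can be seen in a small logic-level sketch of the completion signal. It assumes an ideal NOR function over the negated SEL and the carry bits, and does not model the pseudo-nMOS circuit or its static current; the function name `term_signal` is hypothetical.

```python
def term_signal(sel: int, carries) -> int:
    """Active-high completion signal: NOR over NOT(SEL) and all carries."""
    nor_inputs = [1 - sel, *carries]
    return int(not any(nor_inputs))


if __name__ == "__main__":
    # During the initial phase (SEL = 0) TERM stays low even if all
    # carries happen to be zero, so completion is not signaled early.
    print(term_signal(0, [0, 0, 0, 0]))   # 0
    # During iteration (SEL = 1) TERM goes high once all carries are zero.
    print(term_signal(1, [0, 1, 0, 0]))   # 0
    print(term_signal(1, [0, 0, 0, 0]))   # 1
```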



Fig 4.1: circuit diagram of 4bit self-timed adder

#### Circuit operation

Selection input for two-input multiplexers corresponds to the Req handshake signal and will be a single 0 to 1 transition denoted by SEL. It will initially select the actual operands during SEL = 0 and will switch to feedback/carry paths for subsequent iterations using SEL = 1. The feedback path from the HAs enables the multiple iterations to continue until the completion when all carry signals will assume zero values.
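The sequence of SEL, carry and completion values described above can be sketched as a step-by-step trace. This is a behavioural illustration under ideal-logic assumptions, with bits stored LSB first and one extra sum position added to collect the carry-out; the function name `pasta_trace` and the tuple layout are choices made for this sketch, not taken from the schematic.

```python
def pasta_trace(a_bits, b_bits, cin=0):
    """Return a list of (sel, sum_bits, carry_bits, term) steps.

    a_bits and b_bits are equal-length lists of 0/1, LSB first.
    carry_bits[i] is the carry arriving at bit position i.
    """
    # SEL = 0: multiplexers pass the operands to the half adders.
    s = [a ^ b for a, b in zip(a_bits, b_bits)] + [0]
    c = [cin] + [a & b for a, b in zip(a_bits, b_bits)]
    trace = [(0, s, c, 0)]            # TERM is held low while SEL = 0
    # SEL = 1: multiplexers pass the fed-back sums and carries.
    while True:
        s, c = ([si ^ ci for si, ci in zip(s, c)],
                [0] + [si & ci for si, ci in zip(s[:-1], c[:-1])])
        term = int(not any(c))
        trace.append((1, s, c, term))
        if term:
            return trace


if __name__ == "__main__":
    # a = 1101 and b = 1111, written LSB first.
    for step in pasta_trace([1, 0, 1, 1], [1, 1, 1, 1]):
        print(step)
```

The trace ends with the first step in which every carry is zero, which is where the completion signal would rise.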

As per the schematic above:

Vhigh = 5 V (represents logic level 1)

Vlow = 0 V (represents logic level 0)

Initially the selection line is SEL = 0 for a pulse width of 148 ns; after this it is switched to SEL = 1.

(A0, A1, A2, ..., An) and (B0, B1, B2, ..., Bn) represent the primary operands.

S0, S1, S2, ..., Sn represent the sum bits.

Cin, C1, C2, ..., Cn represent the carry bits.

TERM is the output of the completion detection unit.

#### MULTIPLEXERS AND HALF ADDERS



Fig 4.2 circuit diagram of a multiplexer and half adder unit



Fig 4.3: Circuit diagram of a multiplexer

Here we are using multiplexers to reduce the number of interconnections.

PMOS transistor: L = 2U, W = 18U; NMOS transistor: L = 2U, W = 10U



Fig 4.4: schematic circuit of a multiplexer



Fig 4.5: schematic circuit of a single bit carry module



Fig 4.6: Single bit sum module



Fig 4.7: Schematic circuit of a single bit sum module

Completion detection unit: supply voltage = 5 V, L = 5U, W = 2U



Fig 4.8 Schematic of completion unit



Fig 4.9 Circuit diagram of completion detection unit

The completion detection unit is simply a NOR gate. The output of a NOR gate is high only when all of its inputs are zero. Hence, when all the carries are zero, the completion unit detects this condition, the iterations stop, and the TERM output is asserted.



Fig 4.10 outputs representing logical states '1' and '0'

An upward signal transition indicates logic state '1', and a falling transition indicates logic state '0'. Here Vterm indicates the output of the completion detection unit, and the output is verified when the completion detection output transitions to '1' from its initial state '0'.





Fig 4.11 8bit self-timed adder





Fig 4.12 16bit selftimed adder

## V. SIMULATION RESULTS

The simulation is carried out using Tanner Tool v14.11 to check the functionality of the existing and proposed PASTA circuits. Maximum power is plotted against time for all existing and proposed circuits. The results show a reduction in maximum power compared with the conventional PASTA technique: the proposed 4-bit PASTA offers a 65% reduction and the 8-bit PASTA a 52.6% reduction, while the modified 16-bit PASTA provides a 45% reduction compared with the existing PASTA. The following figures show the average power consumed by the 4-bit, 8-bit and 16-bit PASTA circuits.
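The paper does not state how the reduction percentages are computed. A common convention, assumed here, is the relative difference with respect to the existing design. The helper below is a sketch of that convention only; the function name is hypothetical and the numbers in the usage line are arbitrary placeholders, not measurements from this work.

```python
def percent_reduction(existing: float, proposed: float) -> float:
    """Relative reduction of `proposed` with respect to `existing`, in percent.

    Assumed convention: 100 * (existing - proposed) / existing.
    """
    if existing <= 0:
        raise ValueError("existing value must be positive")
    return 100.0 * (existing - proposed) / existing


if __name__ == "__main__":
    # Placeholder values chosen only to exercise the formula.
    print(percent_reduction(10.0, 4.0))   # 60.0
```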

The power analysis of the existing PASTA adders using the conventional CsCMOS technique is shown below.



Fig 5.1 : 4-bit adder power output



Fig 5.2 : 8-bit adder power output



Fig 5.3 : 16-bit adder power output

The power analysis of the proposed PASTA adders using the modified GDI logic technique, which achieves a reduction in maximum power compared with the conventional technique, is shown below.



Fig 5.4 : 4-bit adder power output







Fig 5.6 16-bit adder power output



Fig 5.7 Comparison between existing and proposed PASTA for Power optimization

From Fig. 5.7, the proposed 8-bit PASTA offers a 52.6% maximum power reduction compared with the conventional PASTA technique. The modified 16-bit PASTA provides a 45% maximum power reduction compared with the existing PASTA.



Fig 5.8 Comparison between existing and proposed PASTA for area utilization

Fig. 5.8 shows the area utilization of the different adder types and sizes; the proposed parallel asynchronous self-timed adder offers 41%, 43% and 42.9% area reduction for the 4-bit, 8-bit and 16-bit adders, respectively, compared with the existing parallel asynchronous self-timed adder. To reduce the number of transistors, GDI logic is used at the transistor level in the proposed PASTA. Logic-level simplification is also performed to reduce the number of gates.

## VI. CONCLUSION

Thus, the power performance and area utilization of the existing and proposed 4-, 8- and 16-bit PASTA adders using GDI logic have been compared for low-power and low-area applications. This work presents an efficient implementation of a PASTA. Initially, the theoretical foundation for a single-rail wave-pipelined adder is established. Subsequently, the architectural design and CMOS implementations are presented. The design achieves a very simple *n*-bit adder that is area and interconnection-wise equivalent to the simplest adder, namely the RCA. Moreover, the circuit works in a parallel manner for independent carry chains, and thus achieves logarithmic average time performance over random input values. The completion detection unit for the proposed adder is also practical and efficient. Simulation results are used to verify the advantages of the proposed approach.
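The claim of logarithmic average time over random inputs can be explored with a small Monte Carlo sketch. It counts feedback iterations in an idealized integer model of the recursive addition, not circuit delay, and the function names, trial count and seed are arbitrary choices made for this illustration.

```python
import random


def feedback_iterations(a: int, b: int) -> int:
    """Number of feedback passes until all carries are zero (ideal model)."""
    s, c = a ^ b, (a & b) << 1
    count = 0
    while c:
        s, c = s ^ c, (s & c) << 1
        count += 1
    return count


def average_iterations(n_bits: int, trials: int = 2000, seed: int = 0) -> float:
    """Average iteration count over uniformly random n-bit operand pairs."""
    rng = random.Random(seed)
    total = sum(
        feedback_iterations(rng.getrandbits(n_bits), rng.getrandbits(n_bits))
        for _ in range(trials)
    )
    return total / trials


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        print(n, round(average_iterations(n), 2))
```

In this model the average grows slowly as the operand width doubles, while the worst case still grows linearly with the width, as for a ripple-carry adder.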

#### REFERENCES

- [1] D. Geer, "Is it time for clockless chips? [Asynchronous processor chips]," IEEE Comput., vol. 38, no. 3, pp. 18–19, Mar. 2005.
- [2] S. Nowick, "Design of a low-latency asynchronous adder using speculative completion," IEE Proc. Comput. Digit. Tech., vol. 143, no. 5, pp. 301–307, Sep. 1996.

- [3] F.-C. Cheng, S. H. Unger, and M. Theobald, "Self-timed carry-lookahead adders," IEEE Trans. Comput., vol. 49, no. 7, pp. 659–672, Jul. 2000.
- [4] P. Choudhury, S. Sahoo, and M. Chakraborty, "Implementation of basic arithmetic operations using cellular automaton," in *Proc. ICIT*, 2008, pp. 79–80.
- [5] M. D. Riedel, "Cyclic combinational circuits," Ph.D. dissertation, Dept. Comput. Sci., California Inst. Technol., Pasadena, CA, USA, May 2004.
- [6] W. Liu, C. T. Gray, D. Fan, and W. J. Farlow, "A 250-MHz wave pipelined adder in 2-μm CMOS," IEEE J. Solid-State Circuits, vol. 29, no. 9, pp. 1117–1128, Sep. 1994.
- [7] J. Sparsø and S. Furber, Principles of Asynchronous Circuit Design. Boston, MA, USA: Kluwer Academic, 2001.
- [8] Mohammed Ziaur Rahman, Lindsay Kleeman and Mohammad Ashfak Habib, "Recursive Approach to the Design of a Parallel Self-Timed Adder", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2014.
- [9] M. Z. Rahman and L. Kleeman, "A delay matched approach for the design of asynchronous sequential circuits," Dept. Comput. Syst. Technol., Univ. Malaya, Kuala Lumpur, Malaysia, Tech. Rep. 05042013, 2013.
- [10] Morgenshtein, A., Fish, A. and Wagner, I.A., "Gatediffusion input (GDI): a power-efficient method for digital combinatorial circuits", IEEE Transactions on VLSI systems, Vol. 10, No. 5, 2002.
- [11] Morgenshtein, A., Moreinis, M. and Ginosar, R., "Asynchronous gate-diffusion-input (GDI) circuits", IEEE Transactions on VLSI systems, Vol. 12, No. 8, 2004.
- [12] Arkadiy Morgenshtein, Viacheslav Yuzhaninov, Alexey Kovshilovsky and Alexander Fish, "Full-Swing Gate Diffusion Input logic-Case-study of low-power CLA adder design", Integration, the VLSI journal, Vol. 47, 62-70, 2014.
- [13] Basant Kumar, M. and Sujit Kumar, P., "Area-Delay-Power Efficient Carry-Select Adder", IEEE Transactions on Circuits and Systems-II: Express Briefs, Vol. 61, No. 6, June 2014.
- [14] Geetha Priya, M. and Baskaran, K., "Low Power Full Adder with Reduced Transistor Count". International Journal of Engineering Trends and Technology (IJETT)-Volume 4 No. 5, May 2013.
- [15] Kalavathidevi, T. and Venkatesh, C., "Gate Diffusion Input (GDI) Circuits Based Low Power VLSI Architecture for a Viterbi Decoder", Iranian Journal of Electrical and Computer Engineering (ACECR), Vol. 10, No. 2, 2011.
- [16] Kapil Mangla and Shashank Saxena, "Analysis of Different CMOS Full Adder Circuits Based on Various Parameters for Low Voltage VLSI Design", International Journal of Engineering and Technical Research (IJETR), Vol. 3, No-5, May 2015.
- [17] Kunal & Nidhi Kedia, "GDI Technique: A Power-Efficient Method for Digital Circuits", International Journal of Advanced Electrical and Electronics Engineering, (IJAEEE), Vol.1, No-3, 2012.
- [18] Parhami, B., "Computer Arithmetic: Algorithms and Hardware Designs", 2<sup>nd</sup> ed. New York, NY, USA: Oxford Univ. Press, 2010.

- [19] Parhi, K.K., "VLSI Digital Signal Processing". New York, NY, USA: Wiley, 1998.
- [20] Shiv Shankar Mishra, Adarsh Kumar Agrawal and R.K. Nagaria, "A comparative performance analysis of various CMOS design techniques for XOR and XNOR circuits". International Journal on Emerging Technologies 1(1): 1-10, 2010.